
    Fast Preprocessing for Robust Face Sketch Synthesis

    Exemplar-based face sketch synthesis methods often face the problem that input photos are captured under lighting conditions different from those of the training photos. The critical step that causes failure is the search for similar patch candidates for an input photo patch. Conventional illumination-invariant patch distances are adopted instead of relying directly on pixel intensity differences, but they fail when the local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of the training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness with negligible computational cost. Comment: IJCAI 2017. Project page: http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sketch/index.htm
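    The abstract does not spell out the remapping, so the following is a minimal sketch of one plausible reading: before patch matching, the luminance statistics of both photos are pulled toward a shared target. The function name, the mean/standard-deviation matching, and the blending factor `alpha` are illustrative assumptions, not the authors' BLR formulation.

```python
import numpy as np
import cv2

def remap_luminance(src, ref, alpha=0.5):
    """Shift the luminance statistics of `src` partway toward `ref`.

    Hypothetical stand-in for one direction of a bidirectional remapping:
    match the mean and standard deviation of the Y (luminance) channel,
    blended toward a common target controlled by `alpha`.
    """
    src_yuv = cv2.cvtColor(src, cv2.COLOR_BGR2YUV).astype(np.float32)
    ref_yuv = cv2.cvtColor(ref, cv2.COLOR_BGR2YUV).astype(np.float32)
    y_src, y_ref = src_yuv[..., 0], ref_yuv[..., 0]
    # Target statistics interpolate between the two photos so that both
    # can be adjusted toward a shared lighting condition.
    mu_t = (1 - alpha) * y_src.mean() + alpha * y_ref.mean()
    sd_t = (1 - alpha) * y_src.std() + alpha * y_ref.std()
    src_yuv[..., 0] = (y_src - y_src.mean()) / (y_src.std() + 1e-6) * sd_t + mu_t
    out = np.clip(src_yuv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_YUV2BGR)

# "Bidirectional" use: adjust both photos toward each other before the
# patch-candidate search.
# input_adj = remap_luminance(input_photo, train_photo)
# train_adj = remap_luminance(train_photo, input_photo)
```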

    Learning to Hallucinate Face Images via Component Generation and Enhancement

    We propose a two-stage method for face hallucination. First, we generate facial components of the input image using CNNs; these components represent the basic facial structures. Second, we synthesize fine-grained facial structures from high-resolution training images and transfer their details to the facial components for enhancement. In the first stage we therefore generate facial components that approximate the ground-truth global appearance, and in the second stage we enhance them by recovering details. Experiments demonstrate that our method performs favorably against state-of-the-art methods. Comment: IJCAI 2017. Project page: http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sr/index.htm
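    As a rough illustration of the two-stage idea, the skeleton below pairs a small upsampling CNN (stage 1, coarse components) with a detail-transfer step that adds the high-frequency residual of a high-resolution exemplar (stage 2). The module names, architecture, and blur-based residual are hypothetical placeholders, not the networks described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComponentGenerator(nn.Module):
    """Hypothetical stage-1 CNN: upsamples a low-resolution face into
    coarse facial components (global appearance only)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, lr_face):
        return self.net(lr_face)

def enhance_with_exemplar(coarse, exemplar_hr):
    """Illustrative stage 2: transfer high-frequency detail from a
    high-resolution exemplar onto the coarse components."""
    # Approximate the exemplar's high-frequency residual with a blur.
    blurred = F.avg_pool2d(exemplar_hr, kernel_size=5, stride=1, padding=2)
    detail = exemplar_hr - blurred
    return coarse + detail
```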

    Stylizing Face Images via Multiple Exemplars

    We address the problem of transferring the style of a headshot photo to face images. Existing methods that use a single exemplar produce inaccurate results when the exemplar does not contain sufficient stylized facial components for a given photo. In this work, we propose an algorithm that stylizes face images using multiple exemplars containing different subjects in the same style. Patch correspondences between an input photo and the multiple exemplars are established using a Markov Random Field (MRF), which enables accurate local energy transfer via Laplacian stacks. Because image patches from multiple exemplars are used, the boundaries of facial components in the target image are inevitably inconsistent. These artifacts are removed by a post-processing step using an edge-preserving filter. Experimental results show that the proposed algorithm consistently produces visually pleasing results. Comment: In CVIU 2017. Project page: http://www.cs.cityu.edu.hk/~yibisong/cviu17/index.htm
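    Local energy transfer via Laplacian stacks can be sketched as follows: build band-pass layers of a region, then rescale each input band so that its local energy matches that of the corresponding exemplar band. The level count, Gaussian sigmas, and epsilon below are illustrative choices, not the paper's parameters.

```python
import numpy as np
import cv2

def laplacian_stack(img, levels=4, sigma=2.0):
    """Build a Laplacian stack: band-pass layers at full resolution plus a
    residual low-pass layer. Parameters here are illustrative."""
    stack, prev = [], img.astype(np.float32)
    for i in range(levels):
        blurred = cv2.GaussianBlur(prev, (0, 0), sigma * (2 ** i))
        stack.append(prev - blurred)
        prev = blurred
    stack.append(prev)  # residual low-pass layer
    return stack

def transfer_local_energy(input_layer, exemplar_layer, eps=1e-4):
    """Scale each input band so its local energy matches the exemplar's,
    a simplified version of the transfer step described above."""
    e_in = cv2.GaussianBlur(input_layer ** 2, (0, 0), 4) + eps
    e_ex = cv2.GaussianBlur(exemplar_layer ** 2, (0, 0), 4) + eps
    return input_layer * np.sqrt(e_ex / e_in)
```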

    Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

    GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W}^+$, and $\mathcal{F}$) to simultaneously maintain image fidelity and enable meaningful manipulation. Moving from the latent space $\mathcal{W}$ to the extended latent space $\mathcal{W}^+$ to the feature space $\mathcal{F}$ of StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore $\mathcal{W}^+$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability. Since $\mathcal{W}^+$ and $\mathcal{F}$ are derived from $\mathcal{W}$, which is essentially the foundation latent space of StyleGAN, methods that focus on the $\mathcal{W}^+$ and $\mathcal{F}$ spaces could be improved by stepping back to $\mathcal{W}$. In this work, we propose to first obtain a precise latent code in the foundation latent space $\mathcal{W}$, using contrastive learning to align $\mathcal{W}$ with the image space. We then leverage a cross-attention encoder to transform the obtained latent code in $\mathcal{W}$ into $\mathcal{W}^+$ and $\mathcal{F}$, respectively. Our experiments show that exploring the foundation latent space $\mathcal{W}$ improves the representation ability of the latent codes in $\mathcal{W}^+$ and the features in $\mathcal{F}$, yielding state-of-the-art reconstruction fidelity and editability on standard benchmarks. Project page: https://github.com/KumapowerLIU/CLCAE
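    The contrastive alignment between $\mathcal{W}$ and the image space could, for instance, take the form of a symmetric InfoNCE objective over matched code/image pairs, as sketched below. The loss shape, temperature, and feature dimensions are assumptions; the released CLCAE code may define the objective differently.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(w_codes, img_feats, temperature=0.07):
    """Illustrative InfoNCE-style loss aligning predicted W-space codes
    with image-space features.

    w_codes:   (B, D) latent codes predicted by the encoder
    img_feats: (B, D) features of the corresponding images
    """
    w = F.normalize(w_codes, dim=-1)
    f = F.normalize(img_feats, dim=-1)
    logits = w @ f.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(w.size(0), device=w.device)
    # Symmetric cross-entropy: each code matches its own image and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```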

    DiffusionDet: Diffusion Model for Object Detection

    We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. During training, object boxes diffuse from ground-truth boxes to a random distribution, and the model learns to reverse this noising process. At inference, the model progressively refines a set of randomly generated boxes into the output detections. This formulation has an appealing flexibility: it allows a dynamic number of boxes and iterative evaluation. Extensive experiments on standard benchmarks show that DiffusionDet achieves favorable performance compared to previous well-established detectors. For example, DiffusionDet achieves gains of 5.3 AP and 4.8 AP when evaluated with more boxes and more iteration steps, under a zero-shot transfer setting from COCO to CrowdHuman. Our code is available at https://github.com/ShoufaChen/DiffusionDet. Comment: ICCV 2023 (Oral), camera-ready version.
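    The training-time corruption of boxes can be illustrated with a toy forward-diffusion step: ground-truth boxes are scaled to a signal range and mixed with Gaussian noise according to a schedule. The linear schedule and the (cx, cy, w, h) normalization below are simplifications for illustration, not the exact settings of the DiffusionDet repository.

```python
import torch

def noise_boxes(gt_boxes, t, num_steps=1000):
    """Toy forward-diffusion step for diffusion-based detection training.

    gt_boxes: (B, N, 4) ground-truth boxes as (cx, cy, w, h) in [0, 1]
    t:        (B,) integer timesteps
    Returns the noisy boxes (back in [0, 1]) and the sampled noise.
    """
    # Simple linear alpha-bar schedule, chosen only for illustration.
    alpha_bar = (1.0 - t.float() / num_steps).clamp(1e-4, 1.0).view(-1, 1, 1)
    signal = gt_boxes * 2.0 - 1.0                        # scale to [-1, 1]
    noise = torch.randn_like(signal)
    noisy = alpha_bar.sqrt() * signal + (1.0 - alpha_bar).sqrt() * noise
    return (noisy + 1.0) / 2.0, noise                    # back to [0, 1]

# During training the detector is asked to recover gt_boxes from the noisy
# boxes; at inference it starts from purely random boxes and iteratively
# denoises them into detections.
```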